129 research outputs found
LiveSketch: Query Perturbations for Guided Sketch-based Visual Search
LiveSketch is a novel algorithm for searching large image collections using
hand-sketched queries. LiveSketch tackles the inherent ambiguity of sketch
search by creating visual suggestions that augment the query as it is drawn,
making query specification an iterative rather than one-shot process that helps
disambiguate users' search intent. Our technical contributions are: a triplet
convnet architecture that incorporates an RNN based variational autoencoder to
search for images using vector (stroke-based) queries; real-time clustering to
identify likely search intents (and so, targets within the search embedding);
and the use of backpropagation from those targets to perturb the input stroke
sequence, so suggesting alterations to the query in order to guide the search.
We show improvements in accuracy and time-to-task over contemporary baselines
using a 67M image corpus.
Comment: Accepted to CVPR 2019
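The core idea of perturbing the query by backpropagation can be pictured as gradient descent on the sketch itself. Below is a minimal, hypothetical PyTorch sketch, not the paper's implementation: `SketchEncoder` is a toy stand-in for the RNN sketch branch, and `target` stands in for a cluster centroid identified as a likely search intent.

```python
# Illustrative sketch (not the authors' code): perturb a vector sketch query
# by backpropagating from a search target through a differentiable encoder.
import torch

class SketchEncoder(torch.nn.Module):
    """Toy stand-in for an RNN-based sketch embedding branch."""
    def __init__(self, in_dim=3, hid=128, emb=64):
        super().__init__()
        self.rnn = torch.nn.GRU(in_dim, hid, batch_first=True)
        self.fc = torch.nn.Linear(hid, emb)

    def forward(self, strokes):            # strokes: (B, T, 3) = (dx, dy, pen)
        _, h = self.rnn(strokes)
        return torch.nn.functional.normalize(self.fc(h[-1]), dim=-1)

encoder = SketchEncoder().eval()
for p in encoder.parameters():             # freeze the network; only the
    p.requires_grad_(False)                # query itself will be updated

strokes = torch.randn(1, 50, 3, requires_grad=True)              # user sketch
target = torch.nn.functional.normalize(torch.randn(1, 64), dim=-1)  # intent centroid

opt = torch.optim.Adam([strokes], lr=0.01)
for _ in range(100):
    opt.zero_grad()
    loss = 1 - torch.nn.functional.cosine_similarity(encoder(strokes), target).mean()
    loss.backward()        # gradients flow back to the stroke sequence itself
    opt.step()             # nudges the query toward the inferred search intent
```

The essential detail is that `requires_grad=True` is set on the stroke tensor, so optimization alters the query rather than the network weights.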
Special section on Non-Photorealistic Animation and Rendering (NPAR) 2010
Robust Synthesis of Adversarial Visual Examples Using a Deep Image Prior
We present a novel method for generating robust adversarial image examples
building upon the recent `deep image prior' (DIP) that exploits convolutional
network architectures to enforce plausible texture in image synthesis.
Adversarial images are commonly generated by perturbing images to introduce
high frequency noise that induces image misclassification, but that is fragile
to subsequent digital manipulation of the image. We show that using DIP to
reconstruct an image under adversarial constraint induces perturbations that
are more robust to affine deformation, whilst remaining visually imperceptible.
Furthermore we show that our DIP approach can also be adapted to produce local
adversarial patches (`adversarial stickers'). We demonstrate robust adversarial
examples over a broad gamut of images and object classes drawn from the
ImageNet dataset.
Comment: Accepted to BMVC 2019
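The reconstruction-under-adversarial-constraint idea can be sketched as a joint loss: a fidelity term tying the prior's output to the original image, plus a classification term steering a classifier toward a chosen label. The following is a rough PyTorch illustration under assumptions of my own (a tiny stand-in DIP generator, a pretrained classifier `clf` supplied by the caller, and a targeted attack shown for concreteness):

```python
# Illustrative sketch (assumptions: pretrained classifier `clf`, a small conv
# generator as the deep image prior; not the paper's exact setup).
import torch
import torch.nn.functional as F

def dip_adversarial(x, clf, target_class, steps=500, lam=0.1):
    """Reconstruct image x through a conv-net prior while steering the
    classifier toward `target_class`; the prior keeps textures plausible."""
    g = torch.nn.Sequential(                       # tiny stand-in DIP network
        torch.nn.Conv2d(32, 64, 3, padding=1), torch.nn.ReLU(),
        torch.nn.Conv2d(64, 3, 3, padding=1), torch.nn.Sigmoid(),
    )
    z = torch.randn(1, 32, x.shape[-2], x.shape[-1])   # fixed noise input
    opt = torch.optim.Adam(g.parameters(), lr=1e-3)
    y = torch.tensor([target_class])
    for _ in range(steps):
        opt.zero_grad()
        x_hat = g(z)
        # Fidelity to the original image + adversarial classification term.
        loss = F.mse_loss(x_hat, x) + lam * F.cross_entropy(clf(x_hat), y)
        loss.backward()
        opt.step()
    return g(z).detach()       # adversarial image synthesized by the prior
```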
Audio-Visual Contrastive Learning with Temporal Self-Supervision
We propose a self-supervised learning approach for videos that learns
representations of both the RGB frames and the accompanying audio without human
supervision. In contrast to images that capture the static scene appearance,
videos also contain sound and temporal scene dynamics. To leverage the temporal
and aural dimension inherent to videos, our method extends temporal
self-supervision to the audio-visual setting and integrates it with multi-modal
contrastive objectives. As temporal self-supervision, we pose playback speed
and direction recognition in both modalities and propose intra- and inter-modal
temporal ordering tasks. Furthermore, we design a novel contrastive objective
in which the usual pairs are supplemented with additional sample-dependent
positives and negatives sampled from the evolving feature space. In our model,
we apply such losses among video clips and between videos and their temporally
corresponding audio clips. We verify our model design in extensive ablation
experiments and evaluate the video and audio representations in transfer
experiments to action recognition and retrieval on UCF101 and HMDB51, audio
classification on ESC50, and robust video fingerprinting on VGG-Sound, with
state-of-the-art results.
Comment: AAAI-23
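As a simplified illustration of how the contrastive and temporal objectives could be combined, here is a minimal PyTorch sketch of my own (random embeddings replace the actual video/audio encoders, and the playback-speed labels are placeholders), pairing a symmetric InfoNCE loss with a speed-classification head:

```python
# Illustrative sketch, not the authors' code: cross-modal InfoNCE between
# temporally aligned clip embeddings plus a playback-speed head.
import torch
import torch.nn.functional as F

def infonce(v, a, tau=0.07):
    """v, a: (B, D) L2-normalized embeddings; positives are aligned pairs."""
    logits = v @ a.t() / tau                      # (B, B) similarity matrix
    labels = torch.arange(v.size(0))              # matched pairs on diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

B, D = 8, 128
v = F.normalize(torch.randn(B, D), dim=-1)        # video clip embeddings
a = F.normalize(torch.randn(B, D), dim=-1)        # matching audio embeddings

speed_head = torch.nn.Linear(D, 4)                # classify 4 playback speeds
speed_labels = torch.randint(0, 4, (B,))          # placeholder labels
loss = infonce(v, a) + F.cross_entropy(speed_head(v), speed_labels)
```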
Higher level techniques for the artistic rendering of images and video
EThOS - Electronic Theses Online Service, United Kingdom
PARASOL: Parametric Style Control for Diffusion Image Synthesis
We propose PARASOL, a multi-modal synthesis model that enables disentangled,
parametric control of the visual style of the image by jointly conditioning
synthesis on both content and a fine-grained visual style embedding. We train a
latent diffusion model (LDM) using specific losses for each modality and adapt
the classifier-free guidance for encouraging disentangled control over
independent content and style modalities at inference time. We leverage
auxiliary semantic and style-based search to create training triplets for
supervision of the LDM, ensuring complementarity of content and style cues.
PARASOL shows promise for enabling nuanced control over visual style in
diffusion models for image creation and stylization, as well as generative
search where text-based search results may be adapted to more closely match
user intent by interpolating both content and style descriptors.
Comment: Added Appendix
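One plausible reading of disentangled control via classifier-free guidance is to compose a separate guidance term per modality, each with its own scale. The sketch below is an assumed formulation with hypothetical names (`eps_model`, `w_c`, `w_s`), not PARASOL's published equations:

```python
# Illustrative sketch of dual-condition classifier-free guidance; an assumed
# formulation, not the paper's. `eps_model(x, t, c, s)` predicts noise, and
# passing None stands for the null embedding learned via conditioning dropout.
import torch

def dual_cfg(eps_model, x_t, t, content, style, w_c=5.0, w_s=3.0):
    e_uncond = eps_model(x_t, t, None, None)       # neither condition
    e_content = eps_model(x_t, t, content, None)   # content only
    e_full = eps_model(x_t, t, content, style)     # content + style
    # Scale content and style guidance independently at inference time.
    return (e_uncond
            + w_c * (e_content - e_uncond)
            + w_s * (e_full - e_content))

# Toy usage with a dummy noise predictor standing in for the LDM's UNet:
eps_model = lambda x, t, c, s: torch.zeros_like(x)
eps = dual_cfg(eps_model, torch.randn(2, 4, 32, 32), 10, "content_emb", "style_emb")
```

Separate scales only become meaningful if each condition is dropped independently during training, which is what the classifier-free guidance adaptation described above suggests.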